On Analyzing Hashtags in Twitter
Abstract
Hashtags, originally introduced on Twitter, are becoming the most common way to tag short messages in social networks, since they facilitate subsequent search, classification and clustering over those messages. However, extracting information from hashtags is difficult because their composition is not constrained by any (linguistic) rule and they usually appear in short, poorly written messages that are hard to analyze with classic IR techniques. In this paper we address two challenging problems regarding the “meaning of hashtags”, namely hashtag relatedness and hashtag classification, and we provide two main contributions. First, we build a novel graph over hashtags and (Wikipedia) entities drawn from the tweets by means of topic annotators (such as TagME); this graph allows us to model effectively not only classic co-occurrences but also semantic relatedness among hashtags and entities, or between entities themselves. Based on this graph, we design algorithms that significantly improve state-of-the-art results on known publicly available datasets. The second contribution is the construction and public release to the research community of two new datasets: the first is a new dataset for hashtag relatedness, and the second is a dataset for hashtag classification that is up to two orders of magnitude larger than the existing ones. We use these datasets to show the robustness and efficacy of our approaches, with absolute improvements in F1 of up to two digits in percentage.
Similar resources
On Recommending Hashtags in Twitter Networks
The Twitter network is currently overwhelmed by the massive amount of tweets generated by its users. To effectively organize and search tweets, users have to depend on appropriate hashtags inserted into tweets. We begin our research on hashtags by first analyzing a Twitter dataset generated by more than 150,000 Singapore users over a three-month period. Among several interesting findings about hashtag ...
Analyzing the Dynamic Evolution of Hashtags on Twitter: a Language-Based Approach
Hashtags are used in Twitter to classify messages, propagate ideas and also to promote specific topics and people. In this paper, we present a linguistic-inspired study of how these tags are created, used and disseminated by the members of information networks. We study the propagation of hashtags in Twitter grounded on models for the analysis of the spread of linguistic innovations in speech c...
Citation Analysis in Twitter: Approaches for Defining and Measuring Information Flows within Tweets during Scientific Conferences
This paper investigates Twitter usage in scientific contexts, particularly the use of Twitter during scientific conferences. It proposes a methodology for capturing and analyzing citations/references in Twitter. First results are presented based on the analysis of tweets gathered for two conference hashtags.
Topic Lifecycle on Social Networks: Analyzing the Effects of Semantic Continuity and Social Communities
Topic lifecycle analysis on Twitter, a branch of study that investigates Twitter topics from their birth through lifecycle to death, has gained immense mainstream research popularity. In the literature, topics are often treated as one of (a) hashtags (independent from other hashtags), (b) a burst of keywords in a short time span or (c) a latent concept space captured by advanced text analysis m...
Exploring the Meaning behind Twitter Hashtags through Clustering
Social networks generate large amounts of data produced by users, who are not limited with respect to the content of the information they exchange. The data generated can be a good indicator of trends and topic preferences among users. In our paper we focus on analyzing and representing hashtags by the corpus in which they appear. We cluster a large set of hashtags using K-means on map ...
A Semantic Continuity Based Analysis of Topic Lifecycle on Social Networks
Analyzing the lifecycle of topics that are present in user-generated text content has emerged as a mainstream topic of social network research. The literature presently identifies topics on Twitter, a prominent online social network, as either individual hashtags, or a burst of keywords within a short span of time, or as latent concept spaces obtained from sophisticated text analysis mechanism...
Publication date: 2015